Sensors, Smart Objects, Devices, Clustered Systems

Rapid adoption rate of digital infrastructure:
5x faster than electricity and telephony
What Is Big Data?
Big data is the term for a collection of data sets so large and complex that they become difficult to process using on-hand database system tools or traditional data processing applications.
Data is being generated at an alarming rate

Every 60 seconds:

* 100,000+ tweets
* 695,000+ status updates
* 11,000,000+ instant messages
* 698,445 Google searches
* 168,000,000+ emails
* 1,820 TB of data created
* 217+ new mobile users
Big Data as an Opportunity

Provides ways to analyze information quickly and make decisions

Many more opportunities
Big Data Collected by Smart Meter

Data is collected at 15-minute intervals

Managing the large volume and velocity of information generated by short-interval reads of smart meter data can overwhelm existing IT resources.


96 million reads per day for every million meters
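
The 96 million figure follows from the 15-minute read interval: each meter produces 24 x 4 = 96 reads per day. A minimal sketch of that arithmetic, where the function name `reads_per_day` is only illustrative:

```python
MINUTES_PER_DAY = 24 * 60


def reads_per_day(meter_count: int, interval_minutes: int = 15) -> int:
    """Interval reads produced per day by `meter_count` smart meters."""
    # One read per interval: 1440 / 15 = 96 reads per meter per day.
    reads_per_meter = MINUTES_PER_DAY // interval_minutes
    return meter_count * reads_per_meter


if __name__ == "__main__":
    print(reads_per_day(1_000_000))
```

Shortening the interval scales the volume directly: a 5-minute interval would triple the number of reads.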



Big Data generated by Smart Meter
Problem with Smart Meter Big Data

To manage and use this information to gain insight, utility companies must be capable of high-volume data management and advanced analytics designed to transform data into actionable insights.
Before analyzing Big Data

Energy utilization and billing have increased

After analyzing Big Data

During peak load, users require more energy

During off-peak times, users require less energy


Time-of-use pricing encourages cost-savvy customers, such as operators of heavy industrial machines, to shift their usage to off-peak times
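
To make the incentive concrete, the sketch below prices the same load at peak and off-peak hours. The peak window and both rates are hypothetical values chosen for illustration; they do not come from any real tariff.

```python
from typing import Dict

# Hypothetical tariff: the peak window and rates are illustrative assumptions.
PEAK_HOURS = range(14, 20)   # 2 pm up to (not including) 8 pm
PEAK_RATE = 0.30             # currency units per kWh during peak hours
OFF_PEAK_RATE = 0.10         # currency units per kWh at all other hours


def energy_cost(usage_kwh_by_hour: Dict[int, float]) -> float:
    """Total cost of hourly usage under the time-of-use tariff above."""
    total = 0.0
    for hour, kwh in usage_kwh_by_hour.items():
        rate = PEAK_RATE if hour in PEAK_HOURS else OFF_PEAK_RATE
        total += kwh * rate
    return total


# The same 200 kWh machine run, scheduled at peak and at off-peak hours.
peak_run = {15: 100.0, 16: 100.0}
off_peak_run = {1: 100.0, 2: 100.0}

if __name__ == "__main__":
    print("peak:", energy_cost(peak_run))
    print("off-peak:", energy_cost(off_peak_run))
```

With these assumed rates, moving the run to off-peak hours cuts its cost to a third.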
IBM Smart Meter Solution

IBM offers an integrated suite of products designed to enable IT to leverage big data in a variety of ways that can contribute to the success of energy companies
Data Mining
Optimizing unit commitment
ONCOR using IBM Smart Meter Solution

Oncor Electric Delivery has incorporated the IBM Smart Meter service


Instrumented: Utilizes smart electricity meters to accurately measure the electricity usage of a household

Unprecedented access to detailed information about their electricity use

Intelligent: Consumers monitor and control their electricity usage through near-real-time readings of electricity meters



Customers in Oncor's service territory showed last year, during the company's Biggest Energy Saver contest, that users reduced their electric usage and bills by 25 percent or more
Problems with Encashing Opportunity

Problems with Big Data
Problem 1: Storing exponentially growing huge datasets

* Data generated in the past 2 years is more than in all previous history combined
* By 2020, total digital data will grow to approximately 44 zettabytes
* By 2020, about 1.7 MB of new information will be created every second for every person
Problem 2: Processing data having complex structure

Structured

* Organized data format
* Data schema is fixed
* Ex: RDBMS data, etc.

Semi-Structured

* Partially organized data
* Lacks the formal structure of a data model
* Ex: XML & JSON files, etc.

Unstructured

* Unorganized data
* Unknown schema
* Ex: multimedia files, etc.
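
The three categories can be illustrated with small in-memory samples. The field names (`meter_id`, `reading_kwh`, `tags`) and values below are made up for the example; only the contrast between the shapes matters.

```python
import csv
import io
import json

# Structured: every row follows the same fixed schema (illustrative columns).
structured_text = "meter_id,reading_kwh\nM1,1.2\nM2,0.8\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: records are self-describing, but their fields can differ.
semi_structured_text = (
    '[{"meter_id": "M1", "reading_kwh": 1.2},'
    ' {"meter_id": "M2", "tags": ["night"]}]'
)
records = json.loads(semi_structured_text)

# Unstructured: an opaque byte payload with no schema at all
# (a stand-in for an image, audio, or video file).
unstructured = bytes([0x00, 0xFF, 0x10, 0x7A])

if __name__ == "__main__":
    print([sorted(row) for row in rows])        # same keys in every row
    print([sorted(rec) for rec in records])     # keys vary per record
    print(len(unstructured), "bytes of raw data")
```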
Problem 3: Processing data faster


Data is growing at a much faster rate than disk read/write speeds

Bringing huge amounts of data to the computation unit becomes a bottleneck
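
A back-of-the-envelope calculation shows why a single disk becomes the bottleneck. The dataset size (1 TB) and disk throughput (100 MB/s) below are assumed round numbers, not measurements.

```python
def read_time_seconds(dataset_bytes: int, disk_bytes_per_sec: int, disks: int = 1) -> float:
    """Time to scan a dataset when it is spread evenly over `disks` disks read in parallel."""
    return dataset_bytes / (disk_bytes_per_sec * disks)


# Assumed round numbers for illustration only.
DATASET_BYTES = 10**12        # 1 TB
DISK_BYTES_PER_SEC = 10**8    # 100 MB/s per disk

if __name__ == "__main__":
    single = read_time_seconds(DATASET_BYTES, DISK_BYTES_PER_SEC)
    parallel = read_time_seconds(DATASET_BYTES, DISK_BYTES_PER_SEC, disks=100)
    print(f"1 disk: {single / 3600:.2f} hours")
    print(f"100 disks: {parallel / 60:.2f} minutes")
```

Under these assumptions, one disk needs close to three hours for a full scan, while 100 disks reading in parallel need under two minutes.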
Hadoop - Solution to Big Data Problems

Hadoop is a framework that allows us to store and process large data sets in a parallel and distributed fashion

HDFS (Storage): Allows us to dump any kind of data across the cluster

MapReduce (Processing): Allows parallel processing of the data stored in HDFS
Hadoop Distributed File System

HDFS creates a level of abstraction over the resources, from where we can see the whole HDFS as a single unit

HDFS has two core components, i.e. the NameNode and the DataNode.

* The NameNode is the main node that contains metadata about the data stored.

* Data is stored on the DataNodes, which are commodity hardware in the distributed environment.
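
The split between metadata and data can be sketched with a toy in-memory model. The classes `ToyNameNode` and `ToyDataNode` and the helper functions are hypothetical names for this illustration; this is not the HDFS API, and it leaves out everything a real cluster needs beyond the basic division of roles.

```python
class ToyDataNode:
    """Holds the actual block contents (toy stand-in for a DataNode)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def put(self, block_id, data):
        self.blocks[block_id] = data

    def get(self, block_id):
        return self.blocks[block_id]


class ToyNameNode:
    """Holds only metadata: which blocks make up a file and where they live."""

    def __init__(self):
        self.files = {}  # path -> ordered list of (block id, DataNode name)

    def register(self, path, placements):
        self.files[path] = list(placements)

    def locate(self, path):
        return self.files[path]


def write_file(namenode, datanodes, path, data, block_size):
    """Split `data` into blocks, store them on DataNodes, record metadata."""
    names = sorted(datanodes)
    placements = []
    for i, offset in enumerate(range(0, len(data), block_size)):
        block_id = f"{path}#blk{i}"
        node_name = names[i % len(names)]  # simple round-robin placement
        datanodes[node_name].put(block_id, data[offset:offset + block_size])
        placements.append((block_id, node_name))
    namenode.register(path, placements)


def read_file(namenode, datanodes, path):
    """Ask the NameNode where the blocks are, then fetch them from DataNodes."""
    return b"".join(datanodes[node].get(block) for block, node in namenode.locate(path))


if __name__ == "__main__":
    nn = ToyNameNode()
    dns = {"dn1": ToyDataNode("dn1"), "dn2": ToyDataNode("dn2")}
    write_file(nn, dns, "/data/readings", b"smart meter readings", block_size=8)
    print(nn.locate("/data/readings"))
    print(read_file(nn, dns, "/data/readings"))
```

A client sees one file path, while the bytes are spread over several nodes, which is the "single unit" abstraction described above.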
Hadoop Cluster

Storing Data (Solution)

Problem 1: Storing exponentially growing huge datasets

Solution: HDFS

■ Storage unit of Hadoop

■ It is a Distributed File System

■ Divides files (input data) into smaller chunks and stores them across the cluster

■ Scalable as per requirement
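
As a rough sketch of the chunking step, the function below plans how a file would be divided and spread over nodes. It assumes a 128 MB block size (a common HDFS default, not stated in these slides) and simple round-robin placement; the node names and the function `plan_blocks` are hypothetical, and real HDFS placement is more involved.

```python
MB = 1024 * 1024


def plan_blocks(file_size, block_size, node_names):
    """Return (block index, block length, node name) for each chunk of a file."""
    plan = []
    offset = 0
    index = 0
    while offset < file_size:
        # The last block may be shorter than the block size.
        length = min(block_size, file_size - offset)
        plan.append((index, length, node_names[index % len(node_names)]))
        offset += length
        index += 1
    return plan


if __name__ == "__main__":
    for index, length, node in plan_blocks(300 * MB, 128 * MB, ["node-a", "node-b", "node-c"]):
        print(f"block {index}: {length // MB} MB on {node}")
```

Adding nodes to `node_names` spreads the same blocks over more machines, which is the sense in which the storage scales as required.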
Store Different Kinds Of Data (Solution)

Problem 2: Storing unstructured data

Solution: HDFS

■ Allows us to store any kind of data, whether structured, semi-structured or unstructured

■ Follows WORM (Write Once Read Many)

■ No schema validation is done while dumping data
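
The last two points can be sketched with a toy store that accepts any bytes without checking them and refuses to overwrite an existing path. `WriteOnceStore` and `parse_json_on_read` are hypothetical names for this illustration, not part of Hadoop; the point is that interpretation of the data happens only when it is read.

```python
import json


class WriteOnceStore:
    """Toy write-once, read-many store: no validation on write, no overwrite."""

    def __init__(self):
        self._files = {}

    def write(self, path, data: bytes):
        if path in self._files:
            raise FileExistsError(path)  # written once, never modified in place
        self._files[path] = data         # stored as-is, whatever the format

    def read(self, path) -> bytes:
        return self._files[path]


def parse_json_on_read(store, path):
    """Apply a schema only at read time; malformed data fails here, not on write."""
    return json.loads(store.read(path).decode("utf-8"))


store = WriteOnceStore()
store.write("/raw/meters.json", b'{"meter_id": "M1", "reading_kwh": 1.2}')
store.write("/raw/notes.txt", b"free text, not JSON at all")  # accepted unchecked

if __name__ == "__main__":
    print(parse_json_on_read(store, "/raw/meters.json"))
```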
Processing Data Faster (Solution)

Problem 3: Processing data faster

Solution: Hadoop MapReduce

■ Provides parallel processing of data present in HDFS

■ Allows us to process data locally, i.e. each node works with the part of the data that is stored on it
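
The map, shuffle, and reduce steps can be imitated in plain Python with a word count, the usual introductory example. This is a single-process sketch and not the Hadoop MapReduce API: each inner list in `partitions` stands in for the portion of the data held on one node, and the loop over partitions is where a real cluster would run the map step in parallel.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the local lines."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(partitions):
    mapped = []
    for local_lines in partitions:  # each "node" maps only its own partition
        mapped.extend(map_phase(local_lines))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    partitions = [["big data", "big cluster"], ["data data"]]
    print(word_count(partitions))
```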
Hadoop Ecosystem

Hadoop provides a scalable solution to store and process huge data sets in a parallel and distributed fashion.

Apache Hive is a data warehousing tool that allows us to perform big data analytics using Hive Query Language, which is very similar to SQL.

Apache Pig is a platform used to analyze large data sets by representing them as data flows.

Apache Spark is an in-memory data processing engine that allows us to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets.

Apache HBase is a NoSQL database that allows us to store unstructured and semi-structured data with ease and provides real-time read/write access.
Big Data Hadoop Certification Training

* Big Data & Hadoop Concepts
* HBase and NoSQL Concepts
* Concepts of HDFS & MapReduce Framework
* Understand Hadoop Architecture & Setup Hadoop Cluster
* Schedule jobs using Oozie
* Implement best practices for Hadoop development
* MapReduce programs
* Understand Spark and Its Ecosystem
* Learn data loading techniques
* Learn how to work in RDD in Spark
* Perform data analytics using Pig & Hive
* Work on a real-life Project on Big Data Analytics
Some Big Data & Hadoop Projects @ Edureka

Project #1: Analyze social bookmarking sites
Industry: Social Media

Project #2: Customer Complaints Analysis
Industry: Retail

Project #3: Tourism Data Analysis
Industry: Tourism

Project #4: Airline Data Analysis
Industry: Aviation

Project #5: Analyze Loan Dataset
Industry: Banking and Finance

Project #6: Analyze Movie Ratings
Industry: Media
Problems with Big Data

Hadoop-as-a-Solution

Thank You

For more information please visit our website
www.edureka.co











